Combinatorial Multi-armed Bandits for Real-Time Strategy Games
Abstract
Games with large branching factors pose a significant challenge for game tree search algorithms. In this paper, we address this problem with a sampling strategy for Monte Carlo Tree Search (MCTS) algorithms called naïve sampling, based on a variant of the Multi-armed Bandit problem called Combinatorial Multi-armed Bandits (CMAB). We analyze the theoretical properties of several variants of naïve sampling, and empirically compare it against the other existing strategies in the literature for CMABs. We then evaluate these strategies in the context of real-time strategy (RTS) games, a genre of computer games characterized by their very large branching factors. Our results show that as the branching factor grows, naïve sampling outperforms the other sampling strategies.
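To make the idea concrete, the following is a minimal sketch of naïve sampling for a CMAB, where each macro-arm is a tuple of per-component choices. It assumes a simple ε-greedy instantiation of both the local MABs (one per component, exploiting the naïve assumption that the reward decomposes across components) and the global MAB over macro-arms seen so far; the class name `NaiveSampling` and the parameters `eps0` and `eps_l` are illustrative, not from the paper.

```python
import random
from collections import defaultdict

class NaiveSampling:
    """Illustrative sketch: naive sampling for a combinatorial
    multi-armed bandit.  A macro-arm is a tuple with one value
    chosen per component."""

    def __init__(self, component_values, eps0=0.4, eps_l=0.2):
        self.component_values = component_values  # list of value lists, one per component
        self.eps0 = eps0    # probability of exploring via the local MABs
        self.eps_l = eps_l  # per-component epsilon-greedy exploration rate
        # per-value running stats: value -> [mean reward, count]
        self.local = [defaultdict(lambda: [0.0, 0]) for _ in component_values]
        # macro-arm -> [mean reward, count], over macro-arms sampled so far
        self.global_ = defaultdict(lambda: [0.0, 0])

    def _local_pick(self, i):
        # epsilon-greedy over the values of component i
        stats = self.local[i]
        if random.random() < self.eps_l or not stats:
            return random.choice(self.component_values[i])
        return max(stats, key=lambda v: stats[v][0])

    def select(self):
        if random.random() < self.eps0 or not self.global_:
            # explore: build a macro-arm component by component,
            # relying on the naive (independence) assumption
            return tuple(self._local_pick(i)
                         for i in range(len(self.component_values)))
        # exploit: best macro-arm observed so far (global MAB)
        return max(self.global_, key=lambda a: self.global_[a][0])

    def update(self, arm, reward):
        # incremental mean update for each local MAB and the global MAB
        for stats, v in list(zip(self.local, arm)) + [(self.global_, arm)]:
            mean, n = stats[v]
            stats[v] = [mean + (reward - mean) / (n + 1), n + 1]
```

In a game-playing setting, each component would correspond to one unit's action, so the number of local arms grows linearly with the number of units even though the number of macro-arms grows exponentially, which is what makes the approach viable at large branching factors.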
Similar Resources
The Combinatorial Multi-Armed Bandit Problem and Its Application to Real-Time Strategy Games
Game tree search in games with large branching factors is a notoriously hard problem. In this paper, we address this problem with a new sampling strategy for Monte Carlo Tree Search (MCTS) algorithms, called Naïve Sampling, based on a variant of the Multi-armed Bandit problem called the Combinatorial Multi-armed Bandit (CMAB) problem. We present a new MCTS algorithm based on Naïve Sampling call...
Semi-Bandits with Knapsacks
We unify two prominent lines of work on multi-armed bandits: bandits with knapsacks and combinatorial semi-bandits. The former concerns limited “resources” consumed by the algorithm, e.g., limited supply in dynamic pricing. The latter allows a huge number of actions but assumes combinatorial structure and additional feedback to make the problem tractable. We define a common generalization, supp...
Playing in stochastic environment: from multi-armed bandits to two-player games
Given a zero-sum infinite game, we examine the question of whether players have optimal memoryless deterministic strategies. It turns out that under some general conditions the problem for two-player games can be reduced to the same problem for one-player games, which in turn can be reduced to a simpler related problem for multi-armed bandits. Digital Object Identifier 10.4230/LIPIcs.FSTTCS.2010.65
Regret Analysis of the Finite-Horizon Gittins Index Strategy for Multi-Armed Bandits
I prove near-optimal frequentist regret guarantees for the finite-horizon Gittins index strategy for multi-armed bandits with Gaussian noise and prior. Along the way I derive finite-time bounds on the Gittins index that are asymptotically exact and may be of independent interest. I also discuss computational issues and present experimental results suggesting that a particular version of the Git...
On the optimality of the Gittins index rule for multi-armed bandits with multiple plays
We investigate the general multi-armed bandit problem with multiple servers. We determine a condition on the reward processes sufficient to guarantee the optimality of the strategy that operates at each instant of time the projects with the highest Gittins indices. We call this strategy the Gittins index rule for multi-armed bandits with multiple plays, or briefly the Gittins index rule. We show...
Journal: J. Artif. Intell. Res.
Volume: 58
Pages: -
Year of publication: 2017